54 research outputs found

    On structural and temporal credit assignment in reinforcement learning

    Reinforcement learning, or learning how to map situations to actions that maximise a numerical reward signal, poses two fundamental interdependent problems: exploration and credit assignment. The exploration problem concerns an agent's ability to discover useful experiences. The credit assignment problem pertains to an agent's ability to incorporate the discovered experiences. The latter itself comprises two distinct subproblems: structural and temporal credit assignment. The structural credit assignment problem involves determining how to assign credit for the outcome of an action to the many component structures, or internal decisions, that could have been involved in producing that action. The temporal credit assignment problem has to do with determining how to assign credit for outcomes of a sequence of experiences to the actions that could have contributed to those outcomes. In this thesis, we broadly study the credit assignment problem in reinforcement learning, making contributions to each of its subproblems in isolation. In the first part of this thesis we address the reinforcement learning problem in environments with multi-dimensional discrete action spaces, a problem setting that plagues structural credit assignment, or generalisation, due to Bellman's curse of dimensionality. We argue that leveraging the combinatorial structure of such action spaces is crucial for achieving rapid generalisation from limited data. To this end, we introduce two approaches for estimating action values that feature a capacity for leveraging such structures, in each case empirically validating that significant improvements in sample complexity can be gained. Furthermore, we demonstrate that our approaches yield significant benefits in space and time complexity, thus allowing them to scale successfully to high-dimensional discrete action spaces where the conventional approach becomes computationally intractable.
    In the second part of this thesis we address the temporal credit assignment problem. Specifically, we identify and analyse general training scenarios where appropriate temporal credit assignment is hindered by the mishandling of time limits or by the choice of discount factor. To address the first matter, we formalise the ways in which time limits may be interpreted in reinforcement learning and how they should be handled in each case accordingly. To address the second matter, we produce a possible explanation for why the performance of low discount factors tends to fall flat when used in conjunction with function approximation. In turn, this leads us to develop a method that enables a much larger range of discount factors by rectifying the hypothesised root cause.
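
    The abstract does not spell out how time limits are handled, so the following is only a minimal sketch of one common interpretation: the time limit is a training convenience rather than part of the task, so a time-out should not be treated as a true termination when forming a one-step TD target. The function names, the `terminated`/`truncated` flags, and the default `gamma` are illustrative assumptions, not the thesis's own code.

```python
def naive_td_target(reward, next_value, terminated, truncated, gamma=0.99):
    """Mishandled case: a time-out is treated like a real terminal state."""
    done = terminated or truncated
    return reward if done else reward + gamma * next_value


def td_target(reward, next_value, terminated, gamma=0.99):
    """One-step TD target that bootstraps through time-outs.

    Assumption: the time limit is not part of the task, so only a genuine
    termination (terminated=True) cuts off the future return.
    """
    if terminated:
        # True terminal state: there is no future return to bootstrap from.
        return reward
    # Non-terminal step, including one cut off by a time limit:
    # keep the discounted estimate of what would have followed.
    return reward + gamma * next_value
```

    With `reward=1.0`, `next_value=10.0`, and `gamma=0.9`, a step cut off by the time limit yields 1.0 under the naive target but 10.0 when bootstrapping is kept, which illustrates how mishandling time-outs can bias value estimates. When the time limit is instead part of the task, a different treatment is needed; this sketch does not cover that case.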

    Exploring Restart Distributions

    We consider the generic approach of using an experience memory to help exploration by adapting a restart distribution. That is, given the capacity to reset the state with those corresponding to the agent's past observations, we help exploration by promoting faster state-space coverage via restarting the agent from a more diverse set of initial states, as well as allowing it to restart in states associated with significant past experiences. This approach is compatible with both on-policy and off-policy methods. However, a caveat is that altering the distribution of initial states could change the optimal policies when searching within a restricted class of policies. To reduce this unsought learning bias, we evaluate our approach in deep reinforcement learning which benefits from the high representational capacity of deep neural networks. We instantiate three variants of our approach, each inspired by an idea in the context of experience replay. Using these variants, we show that performance gains can be achieved, especially in hard exploration problems. Comment: RLDM 201
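
    The general idea of restarting from remembered states can be sketched as follows. This is a hypothetical illustration, not one of the paper's three variants: the class name `RestartMemory`, the uniform sampling rule, and the `p_restart` mixing probability are all assumptions, and it presumes an environment that can be reset to an arbitrary stored state.

```python
import random


class RestartMemory:
    """Fixed-size memory of visited states used as alternative start states."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.states = []
        self._next = 0  # index of the slot to overwrite once the memory is full
        self.rng = random.Random(seed)

    def add(self, state):
        """Store a visited state, overwriting the oldest one when full."""
        if len(self.states) < self.capacity:
            self.states.append(state)
        else:
            self.states[self._next] = state
        self._next = (self._next + 1) % self.capacity

    def sample_restart(self, default_state, p_restart=0.5):
        """Return a stored state with probability p_restart, else the default.

        Mixing in the environment's own start state keeps the original
        initial-state distribution represented.
        """
        if self.states and self.rng.random() < p_restart:
            return self.rng.choice(self.states)
        return default_state
```

    Uniform sampling here loosely mirrors uniform experience replay; a variant that favours states tied to significant past experiences would replace `rng.choice` with a weighted draw.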

    Occupant Privacy Perception, Awareness, and Preferences in Smart Office Environments

    Building management systems tout numerous benefits, such as energy efficiency and occupant comfort, but rely on vast amounts of data from various sensors. Advancements in machine learning algorithms make it possible to extract personal information about occupants and their activities beyond the intended design of a non-intrusive sensor. However, occupants are not informed of data collection and possess different privacy preferences and thresholds for privacy loss. While privacy perceptions and preferences are best understood in smart homes, few studies have evaluated these factors in smart office buildings, where there are more users and different privacy risks. To better understand occupants' perceptions and privacy preferences, we conducted twenty-four semi-structured interviews with occupants of a smart office building between April 2022 and May 2022. We found that data modality features and personal features contribute to people's privacy preferences. Data modality features are defined by the features of the collected modality: spatial, security, and temporal context. In contrast, personal features consist of one's awareness of data modality features and data inferences, definitions of privacy and security, and the available rewards and utility. Our proposed model of people's privacy preferences in smart office buildings helps design more effective measures to improve people's privacy.

    R&D Management in Iran, Opportunities and Threats

    Research and Development (R&D) management in Iran, where R&D units are regarded as the core of product development and innovation, has faced many barriers and obstacles. Due to structural shortcomings, a great number of organizations and industries have not yet been able to position themselves in the market. There are about 1141 R&D units throughout Iran. Because these units are geographically decentralized, this paper analyzes an R&D case study in one of the provinces in the northern part of Iran, and the findings can be generalized to the other industrialized areas and zones in Iran. In this province, there are about 2504 industrial units, of which there are only 44 R&D units certified by the state government. Moreover, only a limited number of these R&D units are extensively active. This paper also addresses the current status of R&D activities to find out why little attention has been paid to these activities in the industrial units. Considering the opportunities and challenges of these R&D units reveals that there is a need to activate these units so that they can quickly respond to changes in the market. Finally, a few alternative solutions and improvement plans are proposed, in which the Iranian R&D Society is responsible for supporting and fostering these action plans towards organizational goals. The research methodology was based on previous field research conducted in Hamedan province, and after the analysis of the research results, a model for the efficiency of R&D units will be presented. Keywords: R&D management, R&D Society, Innovation, Industrial sectors

    A Multimodal Approach for Monitoring Driving Behavior and Emotions

    Studies have indicated that emotions can be significantly influenced by environmental factors; these factors can also significantly influence drivers’ emotional state and, accordingly, their driving behavior. Furthermore, as the demand for autonomous vehicles is expected to increase significantly within the next decade, a proper understanding of drivers’/passengers’ emotions, behavior, and preferences will be needed in order to create an acceptable level of trust with humans. This paper proposes a novel semi-automated approach for understanding the effect of environmental factors on drivers’ emotions and behavioral changes through a naturalistic driving study. The setup includes a frontal road and facial camera, a smart watch for tracking physiological measurements, and a Controller Area Network (CAN) serial data logger. The results suggest that the driver’s affect is highly influenced by the type of road and the weather conditions, which have the potential to change driving behaviors. For instance, when emotional metrics are defined as valence and engagement, the results reveal significant differences in human emotion across weather conditions and road types. Participants’ engagement was higher in rainy and clear weather compared to cloudy weather. Moreover, engagement was higher on city streets and highways compared to one-lane roads and two-lane highways.

    How do Environmental Factors Affect Drivers’ Gaze and Head Movements?

    Studies have shown that environmental factors affect driving behaviors. For instance, weather conditions and the presence of a passenger have been shown to significantly affect the speed of the driver. As the gaze and head movements of the driver are among the important measures of driving behavior, such metrics can potentially be used to understand the effects of environmental factors on the driver’s behavior in real time. In this study, using a naturalistic study platform, videos were collected from six participants over more than four weeks of fully naturalistic driving. The videos of both the participants’ faces and the roads were cleaned and manually categorized by weather, road type, and passenger conditions. Facial videos were analyzed using OpenFace to retrieve the gaze direction and head movements of the driver. Overall, the results suggest that the gaze direction and head movements of the driver are affected by a combination of environmental factors and individual differences. Specifically, the results show the distracting effect of the passenger on some individuals. In addition, they show that highways and city streets cause the greatest distraction of the driver’s gaze.